Factors Affecting neonatal Mortality

survival analysis

clinical trials

GLMS

Author

Bongani Ncube

Published

27 January 2024

knitr::opts_chunk$set(cache = TRUE, 
                      echo = TRUE, 
                      message = FALSE, 
                      warning = FALSE,
                      fig.path = "static",
                      fig.height=6, 
                      fig.width = 1.777777*6,
                      fig.align='center',
                      tidy = FALSE, 
                      comment = NA, 
                      highlight = TRUE, 
                      prompt = FALSE, 
                      crop = TRUE,
                      comment = "#>",
                      collapse = TRUE)
knitr::opts_knit$set(width = 60)
library(tidyverse)
library(reshape2)
theme_set(theme_light(base_size = 16))
make_latex_decorator <- function(output, otherwise) {
  function() {
      if (knitr:::is_latex_output()) output else otherwise
  }
}
insert_pause <- make_latex_decorator(". . .", "\n")
insert_slide_break <- make_latex_decorator("----", "\n")
insert_inc_bullet <- make_latex_decorator("> *", "*")
insert_html_math <- make_latex_decorator("", "$$")

Introduction

Neonatals death refers to the death of newborns before they turn 28 days old (Usman et al.,2019), and can be categorized as early or late depending on whether the death occurred before or after the seventh day of life. The period after birth is widely regarded as the most vulnerable and high-risk time for newborns. This is due to the fact that mortality and morbidity rates are significantly elevated during this period. Studies indicate that over 50% of child deaths occur as early as the first two days of birth, while over 75% occur within the first week (Carlo & Travers, 2016). Some researchers argue that time with the highest chances of neonatal death is within the first 24 hours of life (Nouri et al., 2013). Globally, neonatal mortality accounts for about 44% of child mortality and over 60% of infant mortality, resulting in approximately 2.8 million deaths each year (UNICEF,2014).\

The objectives of the study are to:

To compare survival times of neonates with regards to different categorical variables.
To understand the factors that greatly affect the mortality of newborn babies
To contrast and select the best model between the Cox proportional hazards model and parametric model
To compare and select the best model among the Accelerated Failure time models in modeling survival of neonates.
To choose the overall best fit model between the Cox proportional Hazards ,parametric models and the AFT models.

General survival analysis ideas

Survival analysis deals with the study of time till a certain event of interest occurs.

Censoring

Censoring in survival analysis refers to the situation where the outcome(e.g., death, hospitalization, etc.) has not occurred for some subjects by the end of the study. This can occur for various reasons, such as loss to follow-up, withdrawal from the study, or the occurrence of another event that renders the subject no longer at risk for the event of interest. The common types of censoring include :

left sensoring
right sensoring
interval censoring

Because of censoring,we can not use the usual ordinay statistical methods because we can not compute the descriptive statistics like sample mean,standard deviation or even peform regression analysis.

Survival function

Survival analysis typically involves using T as the response variable, as it represents the time till an outcome occurs, while t refers to the actual survival time of an individual. The distribution function F(t) is used to describe chances of time of survival being less than a given value, t. In some cases, the cumulative distribution function is of interest and can be obtained as follows:

\[\begin{equation} F(t)=P(T\le t)=\int_{t}^{\infty}f(u) \end{equation}\] Interest also tends to focus on the survival or survivorship function which in this study is the chances of a neonate surviving after time \(t\) and also given by the formula

\[\begin{equation} S(t) = P(T > t)=\int_{t}^{\infty} f(u) du \end{equation}\]

\(S(t)\) monotonically decreasing function with \(S(0) = 1\) and \(S(\infty) = \text{lim}_{t \rightarrow \infty} S(t) = 0\).

Suppose we have \(R (t)\) (set at risk) that has neonates who are prone to dying by time t, the chances of a neonate in that set to die in a given interval \([t, t + \delta t)\) is given by the following forrmula, where \(h(t)\) is the hazard rate and is defined as: \[\begin{equation} h(t) = \lim_{\delta t \rightarrow 0} \frac{P(t \le T < t+\delta t | T \ge t)}{\delta t} \end{equation}\] The hazard function’s shape can be any strictly positive function and it changes based on the specific survival data provided.

Relationship between h(t) and S(t)

It is given that \(f(t) = - \frac{d}{dt} S(t)\) hence the equation above can be simplified to:

\[\begin{equation} h(t)=\frac{f(t)}{S(t)} =\frac{- \frac{d}{dt} S(t)}{S(t)}= - \frac{d}{dt} [\log S(t)] \end{equation}\]

Taking intergrals both sides the above reduces to: \[H(t)= -\log S(t)\]

where \(H(t)=\int_{t}^{t} h(u) du\) ,is the cumulative risk of a neonate dying by time t. Since the event is death, then H(t) summarises the risk of death up to time t, assuming that death has not occurred before t. The survival function is also approximately simplified as follows. [ S(t) = e^{-H(t)} -H(t) ]

Non-parametric methods

Estimates of the hazard and survival function are often used to summarize survival data and such methods are often referred to as Non-parametric methods of estimation. More detailed information such as the median and percentiles can be taken from the estimated survival function thereby giving more profound analysis (Cleves et al.,2008, Hanagal, 2011).

Empirical Survival Function

Assuming no censoring in the data points as well as no tied observations in the data. The empirical function of survival is used to give an estimate of S(t) in the absence of censoring and is given by:

\[\begin{align*} \hat{S}(t) = \frac{no~ of~neonates~ with~T \ge t} {total number of neonates} \end{align*}\]

Kaplan-Meier estimate

If there are \(r\) times of deaths among neonates in ICU, such that \(r\leq n\) and \(n\) is the number of neonates with observed survival times. Arranging these times in from the smallest to the largest,the \(j\)-th death time is noted by \(t_{(j)}\) , for \(j = 1, 2, \cdots , r\), such that the \(r\) death times ordred are \[ t_{1}<t_{2}< \cdots < t_{m}\] . Let \(n_j\), for \(j = 1, 2, \ldots, r\), denote the number of neonates who are still alive before time \(t(j)\), including the ones that are expected to die at that time. Additionally, let \(d_j\) denote the number of neonates who die at time \(t(j)\). The estimate for the product limit can be expressed in the following form:

\[\begin{equation} \hat{S}(t) = \prod_{j=1}^k \, \left(1-\frac{d_j}{r_j}\right) \end{equation}\]

To survive until time \(t\), a neonate must start by surviving till \(t(1)\), and then continue to survive until each subsequent time point \(t(2), \ldots, t(r)\). It is assumed that there are no deaths between two consecutive time points \(t(j-1)\) and \(t(j)\). The conditional probability of surviving at time \(t(i)\), given that the neonate has managed to survive up to \(t(i-1)\), is the complement of the proportion of the new born babies who die at time \(t(i)\), denoted by \(d_i\), to the new born babies who are alive just before \(t(i)\), denoted by \(n_i\). Therefore, the probability that is conditional of surviving at time \(t(i)\) is given by \(1 - \frac{d_i}{n_i}\). The variance of the estimator is given by \[\begin{equation} \text{var}(\hat{S}(t)) = [\hat{S}(t)]^2 \sum_{j=1}^k \frac{d_j}{(r_j-d_j) r_j} \end{equation}\] The corresponding standard deviation is therefore given by: \[\begin{equation} \text{s.e}(\hat{S}(t)) = [\hat{S}(t)]\left[\sum_{j=1}^k \frac{d_j}{(r_j-d_j) r_j}\right]^{\frac{1}{2}} \end{equation}\]

Log rank test

Given the problem of contrasting two or more survival functions,for instance survival times between male and female neonates in the ICU. Let \[ t_{1}<t_{2}< \cdots < t_{r}\]

Let \(t_i\) be the unique times of death observed among the neonates, found by joining all groups of interest. For group \(j\),\ let \(d_{ij}\) denote the number of deaths at time \(t_i\), and\ let \(n_{ij}\) denote the number at risk at time \(t_i\) in that group. Let \(d_i\) denote the total number of deaths at time \(t_i\) across all groups, and\

the expected number of deaths in group \(j\) at time \(t_i\) is: ,

\[E(d_{ij}) = \frac{n_{ij} \, d_i}{n_i} = n_{ij} \left(\frac{d_i}{n_i}\right) \]

At time \(t_i\), suppose there are \(d_i\) deaths and \(n_i\) individuals at risk in total, with \(d_{i0}\) and \(d_{i1}\) deaths and \(n_{i0}\) and \(n_{i1}\) individuals at risk in groups 0 and 1, respectively, such that \(d_{i0} + d_{i1} = d_i\) and \(n_{i0} + n_{i1} = n_i\). Such data can be summarized in a \(2 \times 2\) table at each death time \(t_i\).

Given that the null hypothesis is true, the number of deaths is expected to follow the hypergeometric distribution, and therefore: \[\begin{equation} E(d_{i0}) =e_{i0}= \frac{n_{i0} \, d_i}{n_i} = n_{i0} \left(\frac{d_i}{n_i}\right) \end{equation}\]

\[\begin{equation} \text{Var}(d_{i0}) = \frac{n_{i0} \, n_{i1} \, d_i(n_i-d_i)}{n_{i}^2 (n_{i}-1)} \end{equation}\] the statistic \(U_L\) is given by \[U_L=\sum^r_{i=1}(d_{i0}-e_{i0})\] so that the variance of \(U_L\) is \[Var(U_L)=\sum^r_{j=1}v_{i0}=V_L\] which then follows that : \[\frac{U_L}{\sqrt{V_L}} \sim N(0,1)\] and squaring results in \[\frac{U^2_L}{V_L}\sim \chi^2_1\]

Parametric survival distributions

Some of the most used parametric methods include the following:

Regression models in Survival analysis

Cox’s Proportional Hazards

The model is semi-parametric in nature since no assumption about a probabilty distribution is made for the survival times. The baseline model can take any form and the covariate enter the model linearly (Fox,2008). The model can thus be represented as :

\[\begin{equation} h(t|X) = h_0(t) \exp(\beta_1 X_{1} + \cdots + \beta_p X_{p})=h_0(t)\text{exp}^{(\beta^{T} X)} \end{equation}\]

In the above equation, \(h_0(t)\) is known as the baseline hazard function, which represents the hazard function for a neonate when all the included variables in the model are zero. On the other hand, \(h(t|X)\) represents the hazard of a neonate at time \(t\) with a covariate vector \(X=(x_1,x_2,\ldots,x_p)\) and \(\beta^T=(\beta_1,\beta_2,\ldots,\beta_p)\) is a vector of regression coefficients.

Fitting a proportional hazards model

The partial likelihood

Given that \(t(1) < t(2) < \ldots < t(r)\) denote the \(r\) ordered death times. The partial likelihood is given by :

\[\begin{equation} L(\beta) = \prod_{i=1}^{n} \left[\frac{\text{exp}^{\beta ^{T} X_i}}{ \sum_{\ell\in R(t_{(i)}}\text{exp}^{\beta ^{T} X_\ell}}\right]^{\delta_i} \end{equation}\] Maximising the logarithmic of the likelihood function is more easier and computationally efficient and this is generally given by:

\[\begin{equation} \log L(\beta)= \sum_{i=1}^{n} \delta_i \left[ \beta ^{T} X_i - \log \left(\sum_{\ell \in R(t_{(i)}}\text{exp}^{\beta ^{T} X_\ell}\right) \right] \end{equation}\]

Inference on the hazard ratios

Testing the conjecture that \(H_0 :\mathbf{\beta}=\mathbf{\beta_0}\) against \(H_a :\mathbf{\beta}\neq \mathbf{\beta_0}\) . Where \(\mathbf{\beta_0}\) is known,then

The likelihood ratio test is given by

\[T=2 \left[\log(L(\hat{\mathbf{\beta}})) - \log(L(\mathbf{\beta_0}))\right]\] Where given the null hypothesis is true \(T\) has a of \(\chi^2_{(p)}\) and we can find the \(p\)-value at tail probablity of the \(\chi^2_{(p)}\) distribution as \(P(\chi^2_{(p)}\geq T)\)

The wald test statistic has a quadratic form : \[W=(\mathbf{\hat{\beta}}-\mathbf{\beta})^{T}\mathbf{I(\hat{\beta})}(\mathbf{\hat{\beta}-\beta})\] with \[\text{s.e}(\hat{\beta}=\frac{1}{\sqrt{I(\hat{\beta})}}\]

The test statistic for the Wald test has a \(\chi^2\) distribution. To conduct a one-sided test at the \(\alpha\) level of significance, we compare the value of the test statistic, denoted as \(W\), to the chi-square value with \(1 - \alpha\) degrees of freedom.

\[\mathbf{\hat{\beta}} \sim N[\mathbf{\beta},I^{-1}(\mathbf{\hat{\beta}})]\]

According to Dobson (2002) ,the statistic

\[Z^{\*}_i=\frac{\hat{\beta_i}}{\text{s.e}(\hat{\beta_i})}\sim N(0,1)\] is called the wald statistic with a null hypthothesis \[H_0:\beta_i=0\]

Residuals for Cox Regression Model

Six types of residuals are studied in this thesis: Cox-Snell, Modified Cox-Snell, Schoenfeld, Martingale, Deviance, and Score residuals.

Cox-Snell ResidualS

Cox-Snell definition of residuals given by Cox and Snell (1968) is the most common residual in survival analysis. The Cox-Snell residual for the \(ith\) individual, \(i = 1, 2, \cdots , n\), is given by

\[\begin{equation} r_{Ci}=\text{exp}(\mathbf{\hat{\beta}^{T} x_i})\hat{H}_0(t_i) \end{equation}\] where \(\hat{H}_0(t_i)\) is an estimate of the baseline cumulative hazard function at time \(t_i\). It is sometimes called the Nelson-Aalen estimator.

Martingale residuals

These are useful in determing the functional form of the covariates to be included in the model. If the test reveals that the covariate can not be included linearly then there is need for such a covariate to be transformed . The martingale residuals are represented in the following form

\[\begin{equation} r_{Mi}=\delta_i - r_{Ci} \end{equation}\]

Deviance Residuals

The skew range of Martingale residual makes it difficult to use it to detect outliers. The deviance residuals are much more symmetrically distributed about zero and are defined by \[\begin{equation} r_{Di}=\text{sgn}(r_{Mi})\left[-2\{r_{Mi}+\delta_i\log\left(\delta_i-r_{Mi}\right)\}\right]^{\frac{1}{2}} \end{equation}\]

Here, \(r_{Mi}\) represents the martingale residual for the \(i\)-th individual, and the function \(\text{sgn}(\cdot)\) denotes the sign function. Specifically, the function takes the value +1 if its argument is positive and -1 if negative.

Schoenfield residuals

Schoenfeld residuals are a useful tool for examining and testing the proportional hazard assumption, detecting leverage points, and identifying outliers. They can be calculated as follows: \[\begin{equation} r_{Sji}=\delta_i\{x_{ji}-\hat{a}_{ji}\} \end{equation}\] where \(x_{ji}\) is the value of the \(jth\) explanatory variable,\(j = 1, 2, \cdots , p\), for the \(ith\) individual in the study,

\[\begin{equation} \hat{a}_{ji}=\frac{\sum_{\ell \in R(t_i)}x_{j\ell}\text{exp}(\mathbf{\hat{\beta}^{T} x_{\ell}})}{\sum_{\ell \in R(t_i)}\text{exp}(\mathbf{\hat{\beta}^{T} x_{\ell}})} \end{equation}\]

Parametric Proportional Hazard Models

The Cox regression model is often used in survival data analysis as it does not require a specific probability distribution assumption for the survival times. However, assuming a particular probability distribution can improve the precision of inferences, such as estimates of relative hazards and median survival times. Parametric models assume a specific probability distribution for the survival times.\

Weibull PH model

The hazard and the survival function of the Weibull distribution are given by,\ \(h(t) = \gamma \lambda t^{\gamma-1}\) and \(S(t) = e^{-\lambda t^\gamma}\) respectively. The hazard function of a neonate with covariates \(X_1,X_2,X_3,\cdots X_p\) is given by \[\begin{equation} h(t|X_i) =\gamma \lambda t^{\gamma-1} \exp(\beta_1 X_{i1} + \cdots + \beta_p X_{ip})=\gamma \lambda t^{\gamma-1}\text{exp}^{(\beta^{T} X)} \end{equation}\] making use of equation (\(\ref{cox}\)). The corresponding survival function is given by: \[\begin{equation} S_i(t)=\text{exp}\{\exp(\beta^{T} x_i)\lambda t^{\gamma}\} \end{equation}\] The model is filtted using the likelihood function of the \(n\) observations and maximizing this function with respect to unknown parameters \((\beta_1,\beta_2,\beta_3,\cdots,\beta_p),\lambda\) and \(\gamma\) where the likelihood function for the hazard and survivor function is given by : \[\begin{equation} L(\alpha,\mu,\sigma)=\prod_{i=1}^{n}\{h_i(t_i)\}^{\sigma_i}S_i(t_i) \end{equation}\] Taking logarithms in equation (\(\ref{parm}\)) we get \[\begin{equation} \log L(\alpha,\mu,\sigma)=\sum^{n}_{i=1}\{\sigma_i \log h_i(t_i)+\log S_i(t_i)\} \end{equation}\] Now substituting (\(\ref{gam}\)) and (\(\ref{parm}\)) and simplyfying we get \[\begin{equation} \log L(\alpha,\mu,\sigma)=\sum^n_{i=1}[\sigma_i\{\beta` x_i+\log(\lambda\gamma)+\gamma\log (t_i)\} -\lambda\text{exp}(\beta` x_i)t^{\gamma} \end{equation}\]

The Exponential PH model

The survival function and hazard function are given by \(S(t) = e^{-\lambda t}\) and \(h(t)=\sigma\) respectively. Evaluating equation (\(\ref{cox}\)) ,the hazard function for the exponential PH model of a particular neonate is given by
\[\begin{equation} h(t|X_i) =\lambda \text{exp}(\beta_1 X_{i1} + \cdots + \beta_p X_{ip})=\lambda \text{exp}^{(\beta^{T} X)} \end{equation}\]

Accelerated Failure Time Models

These models encompass a broader range of survival time distributions and may be more useful in situations where the proportional hazards assumption is not valid. Models based on this general family can prove to be fruitful. Under the accelerated failure time (AFT) models, we measure the direct impact of the explanatory variables on the survival time, instead of the hazard as we do in proportional hazards models. This approach facilitates the interpretation of results since the parameters quantify the impact of the corresponding covariate on the mean survival time. According to the general AFT model, the hazard function of the \(i\)-th neonate at time \(t\), \(h_i(t)\), is such that: \[\begin{equation} h_i(t)=e^{-\phi_i}h_0(t/e^{-\phi_i}) \end{equation}\] The acceleration factor\(\phi_i\) is given according to the formula \[\phi_i =\text{exp}(\alpha_1x_{1i}+\alpha_2x_{2i}+ \cdots +\alpha_px_{pi}\] where \(x_{ji}\) is the value of the \(jth\) explanatory variable,\(X_j\),(\(j=1,2,3,\cdots,p\)),for the \(ith\) individual,(\(i=1,2,3,\cdots,n\)). For a group of neonates with with covariates \((X_1,X_2,\cdots,X_p)\),The survival function for the \(ith\) individual is given mathematically as \[\begin{equation} S_i(t|x) = S_0(t/\phi_i) \end{equation}\] Where \(S_0(t)\) is the baseline survival function and \(\phi\) is an accelerated factor implying a ratio of survival times corresponding to any fixed value of \(S(t)\). Parametric accelerated failure time models are unified by the adoption of a log-linear representation of the model and this form of the model is adopted by most computer software packages for accelerated failure time modelling. The log-linear model for the random variable \(T_i\) , associated with the lifetime of the \(ith\) individual in a survival study is given by

\[\begin{equation} \log T_i=\mu +\alpha_1x_{1i}+\alpha_2x_{2i}+ \cdots +\alpha_px_{pi}+\sigma\epsilon_i \end{equation}\] In this model, \(\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)\) are the unknown coefficients of the explanatory variables \(X_1, X_2, \ldots, X_p\), with values \((x_{1i}, x_{2i}, \ldots, x_{pi})\) for the \(i\)-th individual. \(\mu\) and \(\sigma\) are two additional parameters known as the intercept and scale parameter, respectively. The quantity \(\sigma_i\) is a random variable used to model the deviation of the values of \(\log(T_i)\) from the linear component of the model. \(\sigma\) is a scale parameter and an error term typically expressed using a normal, logistic, or standard Gumbel distribution. The specific case of a standard Gumbel distribution for \(\sigma\) implies a conditional (on \(\beta\) and \(\sigma\)) Weibull survival model for \(T\) with shape parameter \(\alpha = 1/\alpha\) and scale parameter \(\lambda(\beta, \sigma) = \exp(-x^T\beta/\sigma)\) (Christensen et al., 2011). The \(\alpha\)-parameters reflect the impact of each explanatory variable on the survival times; positive values indicate that the survival time increases with increasing values of the explanatory variable, and vice versa. It can be shown that the survivor function for the \(i\)-th individual in an accelerated failure time model is given by: \[\begin{equation} S_i(t)=S_0\{t/\text{exp}(\alpha'x_i)\} \end{equation}\] The corresponding hazard function is also given by \[\begin{equation} h_i(t)=\text{exp}(\alpha'x_i)h_0\{t/\text{exp}(\alpha'x_i)\} \end{equation}\]

Weibull AFT model

Assuming that the survival times follow a Weibull distribution with scale parameter \(\lambda\) and shape parameter \(\gamma\) so that the baseline hazard as given by \(h(t) = \gamma \lambda t^{\gamma-1}\),the hazard function for the \(ith\) individual is then, from Equation (\(\ref{AFTS}\)), given by \[\begin{equation} h_i(t)=e^{-\phi_i}\lambda\gamma(e^{\phi_it})^{\gamma-1}=(e^{-\phi_i})^{\gamma}\lambda\gamma t^{\gamma-1} \end{equation}\] Implying that the survival time of the individual has a weibull distribution with scale parameter \(\lambda e^{-\gamma\phi_i}\) and shape parameter \(\gamma\). This property of the weibull distribution is called accelerated failure time property. The survivor and hazard function under the log-linear is given by

\[\begin{equation} S_i(t)=\text{exp}\left\{-\text{exp}\left(\frac{\log t -\mu -\phi_i}{\sigma}\right)\right\} \end{equation}\] and \[\begin{equation} h_i(t)=\frac{1}{\sigma t}\text{exp}\left(\frac{\log t -\mu -\phi_i}{\sigma}\right) \end{equation}\]

log-logistic AFT model

Suppose the suvival times follow a log-logistic distribution with a baseline hazard of \(h(t)=\frac{\exp^{\theta}kt^{k-1}}{1+\text{exp}^{\theta}t^k}\) ,it can be shown that the hazard function for the \(ith\) individual is then, from Equation (\(\ref{AFTS}\)), given by \[\begin{equation} h_i(t)=\frac{e^{\theta-k\phi_i}kt^{k-1}}{1+e^{\theta-k\phi_i}t^k} \end{equation}\] It then follows that the survival time for the ith individual also has a log-logistic distribution with parameters \(\theta- k\phi_i\) and \(k\). The log-linear form of the accelerated failure time model provides a representation of the log-logistic such that the survivor function of \(T_i\) is \[\begin{equation} S_i(t)=\left\{1+\text{exp}\left(\frac{\log t -\mu-\phi_i}{\sigma}\right)\right\}^{-1} \end{equation}\] where \(\phi_i=\alpha_1x_{1i}+\alpha_2x_{2i}+\cdots +\alpha_px_{pi}\) and corresponding hazard function given by \[\begin{equation} h_i(t)=\frac{1}{\sigma t}\left\{1+\text{exp}\left[-\left(\frac{\log t -\mu-\phi_i}{\sigma}\right)\right]\right\}^{-1} \end{equation}\]

Lognormal AFT model

The baseline survival function for a model that is assumed to have a logormal distribution is given below. survivor function is given by \[S_0(t)=1-\Phi\left(\frac{\log t- \mu}{\sigma}\right)\] The corresponding survivor function for the \(ith\) individual under the accelerated failure time model,is then \[S_i(t)=S_0(e^{-\phi_i}t)\] where \(\phi_i=\alpha_1x_{1i}+\alpha_2x_{2i}+\cdots +\alpha_px_{pi}\) is a linear combination of the values of \(p\) explanatory variables for the \(ith\) individual. This implies that the corresponding survivor function is given by \[\begin{equation} S_i(t) = 1- \Phi\left(\frac{\log t -\phi_i-\mu}{\sigma}\right) \end{equation}\] this is the survival function for an individual whose survival times have a lognormal distribution with parameters \(\mu +\phi_i\) and \(\sigma\). Implying that this model has the accelerated failure time property.

Model Checking

Akaike Information Criterion

Contrasting between a number of possible models, which can be unnested, can also be made on the basis of Akaike’s information criterion, given by \[AIC = -2logL + 2q\] in which \(q\) is the number of unknown \(\beta\)-parameters in the model.

Bayesian Information Criterion

An alternative to the AIC statistic is the Bayesian Information Criterion, given by \[BIC = -2logL +qlogd\], where \(q\) is the number of unknown parameters in the fitted model and d is the number of uncensored observations in the data set.

Variable selection

Stepwise regression

The method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some pre- specified criterion. The aim of stepwise regression is to identify the most important predictors and to reduce the complexity of the model.There are three types of stepwise regression which are forward Selection and backward Elimination.

Backward elimination

This is the opposite of forward selection. It is a stepwise regression that begins with all of the variables in the model and then removes them one at a time until only those that are statistically significant remain. This technique can be used to reduce over-fitting and improve model accuracy.

data wrangling


out_new<-df |> 
  mutate(Maternal_age= case_when(AGE==1~"15-25",
                                 AGE==2~"25-29",
                                 AGE==4~"30-39",
                                 AGE==3~"40+"),
         Education=case_when(Education==3~"Primary",
                             Education==5~"Tertiary",
                             Education==2~'None',
                             Education==1~"None",
                             Education==4~"Secondary"),
         'Marital_Status'=case_when(`Marital Status`==3~"Cohabiting",
                                    `Marital Status`==1~"Married",
                                    `Marital Status`==2~"Single",
                                    `Marital Status`==4~"other"),
         'husband_employed'=case_when(`husband employed`==2~"Not employed",
                                      `husband employed`==1~"employed"),
         Reffered= case_when(Reffered==2~"Not reffered",
                             Reffered==1~"reffered"),
         'gestational_age'=case_when(`gestational age`==0~"above 32",
                                     `gestational age`==1~"below 32"),
         Mode= case_when(Mode==2~"C section",
                         Mode==1~"NVD",
                         Mode==3~"other",
                         Mode==4~"other"),
         Booking=case_when(Booking==1~"Booked",
                           Booking==2~"Not Booked",
                           Booking==3~"Not Booked"),
        BirthWeight=case_when(BirthWeight==1~"<1.5Kg",
                              BirthWeight==0~"1.5-2.5Kg"),
                              
        gender=case_when(gender==1~"Male",
                         gender==2~"Female"),
        HIE= case_when(HIE==1~"yes",
                       HIE==0~"no"),
         COAF=ifelse(COAF==1,"clear/yellow","green/brown"),
        AMO=ifelse(A.F.O==1,"foul","not foul"),
        preterm_rapture=ifelse(Preterm==1,"yes","no"),
        Birth_attended =case_when(B.Attended==2~"Nurse/Midwife",
                                    B.Attended==1~"Doctor",
                                    B.Attended==3~"Traditional"),
        Parity=case_when(Parity==0|Parity>2~"High Risk",
                         Parity==1|Parity==2~"adequate"),
        status=ifelse(DEATH==1,"DIED","SURVIVED"))


out_new <- out_new |> 
  mutate_if(is.character,as.factor) |> 
  rename(gestAge="gestational age")
DT::datatable(out_new,filter = "top")

Comments

lookas like our dataset was well balanced

Demographics

Characteristic	DIED, N = 200	SURVIVED, N = 200	Test Statistic	p-value¹
husband_employed, n / N (%)			0.66	0.42
employed	88 / 200 (44%)	80 / 200 (40%)
Not employed	112 / 200 (56%)	120 / 200 (60%)
Birth tabattended, n / N (%)			9.2	0.010
Doctor	37 / 200 (19%)	49 / 200 (25%)
Nurse/Midwife	158 / 200 (79%)	135 / 200 (68%)
Traditional	5 / 200 (2.5%)	16 / 200 (8.0%)
Marital_Status, n / N (%)			16	0.001
Cohabiting	104 / 200 (52%)	108 / 200 (54%)
Married	48 / 200 (24%)	55 / 200 (28%)
other	30 / 200 (15%)	8 / 200 (4.0%)
Single	18 / 200 (9.0%)	29 / 200 (15%)
Reffered, n / N (%)			2.6	0.11
Not reffered	156 / 200 (78%)	142 / 200 (71%)
reffered	44 / 200 (22%)	58 / 200 (29%)
Education, n / N (%)			74	<0.001
None	52 / 200 (26%)	55 / 200 (28%)
Primary	61 / 200 (31%)	86 / 200 (43%)
Secondary	12 / 200 (6.0%)	48 / 200 (24%)
Tertiary	75 / 200 (38%)	11 / 200 (5.5%)
Booking, n / N (%)			107	<0.001
Booked	79 / 200 (40%)	178 / 200 (89%)
Not Booked	121 / 200 (61%)	22 / 200 (11%)
¹ Pearson’s Chi-squared test

Comments

The table above presents the descriptive statistics of the variables of interest.
It reveals that out of the 400 neonates sampled for the study, 200 of them survived and 200 died.
The preliminary analysis using Pearson chi-square showed that booking status and Education level of the mother all had a \(p<0.001\) (\(<0.001\)) , Marital status of the mother and birth attendance also had significant \(p-value\) given by \(( \chi^2=16,p-value=0.001)\) and \(( \chi^2=9.2,p-value=0.01)\) respectively implying that the variables had significant association with the neonate’s status(died or survived).
Employment status of the father and the variable reffered were seen not to be having significant association with the neonate’s status at \(5\%\) level of significance.

Clinical factors

Characteristic	DIED, N = 200	SURVIVED, N = 200	Test Statistic	p-value¹
Mode, n / N (%)			89	<0.001
C section	62 / 200 (31%)	110 / 200 (55%)
NVD	59 / 200 (30%)	87 / 200 (44%)
other	79 / 200 (40%)	3 / 200 (1.5%)
gestational_age, n / N (%)			19	<0.001
above 32	43 / 200 (22%)	83 / 200 (42%)
below 32	157 / 200 (79%)	117 / 200 (59%)
COAF, n / N (%)			14	<0.001
clear/yellow	129 / 200 (65%)	92 / 200 (46%)
green/brown	71 / 200 (36%)	108 / 200 (54%)
AMO, n / N (%)			0.16	0.69
foul	114 / 200 (57%)	118 / 200 (59%)
not foul	86 / 200 (43%)	82 / 200 (41%)
Parity, n / N (%)			0.01	0.92
adequate	80 / 200 (40%)	81 / 200 (41%)
High Risk	120 / 200 (60%)	119 / 200 (60%)
BirthWeight, n / N (%)			112	<0.001
<1.5Kg	117 / 200 (59%)	17 / 200 (8.5%)
1.5-2.5Kg	83 / 200 (42%)	183 / 200 (92%)
Maternal age, n / N (%)			21	<0.001
15-25	31 / 191 (16%)	46 / 199 (23%)
25-29	32 / 191 (17%)	61 / 199 (31%)
30-39	81 / 191 (42%)	47 / 199 (24%)
40+	47 / 191 (25%)	45 / 199 (23%)
Unknown	9	1
preterm_rapture, n / N (%)			18	<0.001
no	43 / 200 (22%)	82 / 200 (41%)
yes	157 / 200 (79%)	118 / 200 (59%)
¹ Pearson’s Chi-squared test

Comments

Table above shows the frequency and tests for independence between the categorical variable that are more clinical and the outcome variable of a neonate, the first variable is mode and has three categories which are C section, NVD and other.
The test statistic is 89, with a very small \(p-value\) indicating a strong association between the mode of delivery and the outcome variable.
Another variable is gestational age and has two categories: above 32 weeks and below 32 weeks ,the results from the chi-squared test indicate that that there is strong association between gestational age and the outcome variable \((\chi^2=19,p<10001)\).
fluid odour on the other hand has two categories: foul and not foul. The test statistic is \(\chi^2= 0.16)\), with a p-value of 0.69, indicating no significant association between fluid odour and the outcome death of the neonates .
Another variable is parity and has two categories: adequate and high risk, the result show \((\chi^2=0.01, p-value= 0.92)\), indicating no significant association between parity and the outcome event of the neonate.
Birth weight was also considered and has two categories: less than 1.5 Kg and 1.5-2.5 Kg, table shows that \((\chi^2=112, p-value<0.001)\), indicating a significant association between birth weight and the final event outcome of a neonate.
The seventh characteristic, maternal age, has four categories: \(15-25\), \(25-29\), \(30-39\), and \(40+\).
The test statistic is 21, with a very small p-value, indicating a significant association between maternal age and the outcome and the outcome of death.
Preterm rupture, has two categories: yes and no,the test statistic of 18, has a corresponding \(p-value\) \((p<0001)\) indicating a significant association between preterm rupture and death of a neonate.

further exploration on

Out of interest , research shows that babies born preterm have a higher risk of dying

Whats the relationship between death, preterm rapture and mode of delivery

	C section	NVD	other
DIED
no	34.88% (15)	32.56% (14)	32.56% (14)
yes	29.94% (47)	28.66% (45)	41.40% (65)
SURVIVED
no	52.44% (43)	45.12% (37)	2.44% (2)
yes	56.78% (67)	42.37% (50)	0.85% (1)

`C section` vs `DIED==Yes`

	C section	NVD	other
DIED
no	34.88% (15)	32.56% (14)	32.56% (14)
yes	29.94% (47)	28.66% (45)	41.40% (65)
SURVIVED
no	52.44% (43)	45.12% (37)	2.44% (2)
yes	56.78% (67)	42.37% (50)	0.85% (1)

Proportions

#>      n_often n_not_often
#> [1,]      15          28
#> [2,]      47         110

Proportional test for those who died and had C section

#> 2-sample test for equality of proportions
#> 
#> Proportion      | Difference |        95% CI | Chi2(1) |     p
#> --------------------------------------------------------------
#> 34.88% / 29.94% |      4.95% | [-0.12, 0.22] |    0.19 | 0.663
#> 
#> Alternative hypothesis: two.sided

Comments

for our study ,we found a value of 0.19, which is small and definitely statistically insignificant. In a hypothetical world where there’s no difference in the proportions, the probability of seeing a difference of at least \(4.95\) percentage points is super huge (p = 0.663).
We have enough evidence to confidently accept the null hypothesis and declare that the proportions of the groups are are the same. With the confidence interval, we can say that we are 95% confident that the interval [-0.12, 0.22] captures the true population parameter.
however We can’t say that there’s a 95% chance that the true value falls in this range—we can only talk about the range.

Survival Analysis

Instantaneous Hazard

Comments

The Figure above is found after fitting null models for exponential, Cox PH, log-logistic, lognormal, weibull (PH) and weibull (AFT), none of them seemed great.
The three curves Weibull (AFT), Log-logistic and Log-normal seem especially better choices.
We further analyzed goodness of fit for the null models using the Akaike Information criterion .

Comments

Using the AIC, the best fit model appears to be the lognormal model since it has the least AIC value of \(1222.156\) as compared to other models.
Thus the prelimenary analysis of the null models show that the Lognormal AFT and the Weibull (PH) seem to be better fit for the null models.

kaplan Meier Curves

Comments

The mean and median overall survival times for neonates were 4.135 and 3 days, respectively.
The minimum and maximum survival times observed were 1 and 28 days, respectively.
These findings suggest that neonates, on average, experience death after 4 days from birth, with some neonates dying as early as the first day and as late as the 28th day.
the Kaplan-Meier curve showing the probability of neonatal survival and shows the actual survival times and the probability of survival over time.
The Kaplan-Meier cumulative hazard function estimate revealed that the cumulative hazard rises more sharply at around times 12, 23, and 27 days. This information provides insight into the timing and frequency of events and can help to identify potential risk factors for neonatal mortality.

Gender and Color of Amniotic fluid

Comments

The plots above show the difference in survival times for the selected variables. it can be noticed that there is no significant difference in survival times between males and females, this result is also further clarified by the logrank test which will proceed in the coming sections.
The second plot shows that neonates with clear/yellow fluid color have more survival time as compared to the other group and this can also be explained by the log rank test.

kaplan meier estimates gestational age And amniotic fluid odour

Comments

The first plot from figure above shows that there is no significant difference in survival times of neonates with foul and non foul amniotic fluid.
The second plot however reveals that the neonates with gestational age above 32 weeks have a longer survival time as compared to their counterparts with age less that 32 weeks.

Mode of delivery and Birthweight

Comments

It can be noted from the first plot that, babies whose mothers have a Normal Vaginal Delivery (NVD) survived longer as compared to those who had Caesarian section and other modes.
Babies born with weight between 1.5-2.5kg also survived longer than those with weight below 1.5 kg.

Preterm Rapture And Parity

Comments

It can be noted from the first graph that neonates whose mothers did not have a preterm rapture have a longer survival time as compared to those whose mothers had a preterm rapture.
There is however no difference in survival times for the two classes in the parity variable.

Univariate Cox and Univariate Logistic regression

Characteristic	Death Status			Time to Death
Characteristic	N	OR (95% CI)¹	p-value	N	HR (95% CI)¹	p-value
Birth_attended	400			400
Doctor		—			—
Nurse/Midwife		1.55 (0.96 to 2.53)	0.076		1.12 (0.78 to 1.60)	0.54
Traditional		0.41 (0.13 to 1.16)	0.11		0.69 (0.27 to 1.77)	0.44
Parity	400			400
adequate		—			—
High Risk		1.02 (0.68 to 1.52)	0.92		0.85 (0.64 to 1.14)	0.28
gender	400			400
Female		—			—
Male		0.88 (0.59 to 1.32)	0.54		0.82 (0.62 to 1.09)	0.18
BirthWeight	400			400
<1.5Kg		—			—
1.5-2.5Kg		0.07 (0.04 to 0.11)	<0.001		0.33 (0.25 to 0.44)	<0.001
COAF	400			400
clear/yellow		—			—
green/brown		0.47 (0.31 to 0.70)	<0.001		0.70 (0.52 to 0.93)	0.015
AMO	400			400
foul		—			—
not foul		1.09 (0.73 to 1.62)	0.69		1.11 (0.84 to 1.47)	0.46
Mode	400			400
C section		—			—
NVD		1.20 (0.76 to 1.90)	0.42		0.75 (0.52 to 1.08)	0.12
other		46.7 (16.6 to 196)	<0.001		2.90 (2.08 to 4.06)	<0.001
Maternal_age	390			390
15-25		—			—
25-29		0.78 (0.42 to 1.46)	0.43		0.83 (0.50 to 1.36)	0.45
30-39		2.56 (1.44 to 4.60)	0.002		1.21 (0.80 to 1.84)	0.37
40+		1.55 (0.84 to 2.87)	0.16		1.12 (0.71 to 1.76)	0.64
gestational_age	400			400
above 32		—			—
below 32		2.59 (1.68 to 4.04)	<0.001		1.51 (1.08 to 2.13)	0.017
preterm_rapture	400			400
no		—			—
yes		2.54 (1.64 to 3.96)	<0.001		1.49 (1.06 to 2.10)	0.021
HIE	400			400
no		—			—
yes		41.4 (12.7 to 255)	<0.001		15.3 (3.80 to 61.8)	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval, HR = Hazard Ratio, CI = Confidence Interval

Comments

the table above shows individual univariate Cox and logistic regression models
the univariate models help to add more details to the chi-square tests as with the chisquare tests we note the following :
the hazard of death and the odds of dying are lower for babies born with normal birth weight and both the results from Cox and Logistic regression are statistically significant
though obvious , it is interesting to note what data actually proves . Babies born below the normal gestational age are at a higher risk of dying since both the unadjusted odds and hazard ratios are above 1 as compared to the other babies

Multivariate Cox and Logistic Regression

Characteristic	Logistic		Cox PH
Characteristic	HR (95% CI)¹	p-value	OR (95% CI)¹	p-value
Birth_attended
Doctor	—		—
Nurse/Midwife	1.23 (0.82 to 1.82)	0.31	1.57 (0.79 to 3.17)	0.20
Traditional	1.14 (0.43 to 3.00)	0.79	0.30 (0.05 to 1.36)	0.15
Parity
adequate	—		—
High Risk	0.88 (0.64 to 1.22)	0.46	1.28 (0.68 to 2.44)	0.45
gender
Female	—		—
Male	1.06 (0.78 to 1.45)	0.71	1.82 (1.02 to 3.33)	0.046
BirthWeight
<1.5Kg	—		—
1.5-2.5Kg	0.46 (0.31 to 0.67)	<0.001	0.10 (0.04 to 0.23)	<0.001
COAF
clear/yellow	—		—
green/brown	0.77 (0.55 to 1.09)	0.14	0.58 (0.32 to 1.06)	0.077
AMO
foul	—		—
not foul	1.40 (1.00 to 1.96)	0.047	1.40 (0.78 to 2.55)	0.26
Mode
C section	—		—
NVD	0.72 (0.49 to 1.07)	0.10	1.09 (0.60 to 1.98)	0.78
other	1.59 (1.05 to 2.40)	0.028	18.6 (4.47 to 134)	<0.001
Maternal_age
15-25	—		—
25-29	0.79 (0.48 to 1.31)	0.36	0.75 (0.31 to 1.81)	0.53
30-39	0.85 (0.54 to 1.33)	0.48	2.38 (1.02 to 5.68)	0.048
40+	0.83 (0.51 to 1.37)	0.47	2.01 (0.83 to 4.93)	0.12
gestational_age
above 32	—		—
below 32	1.31 (0.92 to 1.87)	0.14	3.05 (1.62 to 5.96)	<0.001
HIE
no	—		—
yes	9.54 (2.34 to 38.8)	0.002	111 (18.8 to 1,397)	<0.001
¹ HR = Hazard Ratio, CI = Confidence Interval, OR = Odds Ratio, CI = Confidence Interval

Model diagnostics for Cox Regression

Deviance Residuals

Dfbeta

Comments

The figure above shows the estimated changes that are found in the regression coefficients that are observed from removing each observation one after the other.
We compare size in values of the largest dfbetas values to the regression coefficients and as seen from the plot above none of the observations is very influential.

Martingale and Deviance Residuals

Comments

one of the plots shows deviance residuals which are normalized transformations of the Martingale residuals.
Deviance residuals should be symmetrically distributed about zero as shown in the plot with a standard deviation of 1 for the proportional hazard assumption to hold.

Parametric models

Univariate Weibull model

Characteristic	N	exp(Beta) (95% CI)¹	p-value
Birth_attended	400
Doctor		—
Nurse/Midwife		-0.07 (-0.35 to 0.21)	0.63
Traditional		0.23 (-0.49 to 0.96)	0.53
Parity	400
adequate		—
High Risk		0.14 (-0.08 to 0.35)	0.22
gender	400
Female		—
Male		0.17 (-0.05 to 0.38)	0.13
BirthWeight	400
<1.5Kg		—
1.5-2.5Kg		0.79 (0.56 to 1.02)	<0.001
COAF	400
clear/yellow		—
green/brown		0.25 (0.02 to 0.47)	0.034
AMO	400
foul		—
not foul		-0.09 (-0.30 to 0.13)	0.44
Mode	400
C section		—
NVD		0.27 (0.00 to 0.55)	0.049
other		-0.70 (-0.96 to -0.44)	<0.001
Maternal_age	390
15-25		—
25-29		0.17 (-0.22 to 0.55)	0.39
30-39		-0.09 (-0.42 to 0.23)	0.57
40+		-0.04 (-0.40 to 0.31)	0.81
gestational_age	400
above 32		—
below 32		-0.29 (-0.56 to -0.02)	0.036
preterm_rapture	400
no		—
yes		-0.28 (-0.55 to -0.01)	0.043
HIE	400
no		—
yes		-2.12 (-3.27 to -0.97)	<0.001
¹ CI = Confidence Interval

Multivariate Weibul PH model

Independent Variables	N	exp(Beta) (95% CI)¹	p-value
Weibull model
(Intercept)	390	3.45 (2.19 to 4.70)	<0.001
Birth_attended	390
Doctor		—
Nurse/Midwife		-0.15 (-0.46 to 0.15)	0.33
Traditional		-0.21 (-0.95 to 0.54)	0.59
Parity	390
adequate		—
High Risk		0.11 (-0.14 to 0.36)	0.39
gender	390
Female		—
Male		-0.03 (-0.28 to 0.21)	0.78
BirthWeight	390
<1.5Kg		—
1.5-2.5Kg		0.56 (0.26 to 0.85)	<0.001
COAF	390
clear/yellow		—
green/brown		0.17 (-0.10 to 0.44)	0.21
AMO	390
foul		—
not foul		-0.26 (-0.52 to 0.00)	0.054
Mode	390
C section		—
NVD		0.30 (0.01 to 0.60)	0.045
other		-0.30 (-0.62 to 0.02)	0.064
Maternal_age	390
15-25		—
25-29		0.21 (-0.18 to 0.60)	0.29
30-39		0.16 (-0.18 to 0.50)	0.36
40+		0.21 (-0.16 to 0.59)	0.27
gestational_age	390
above 32		—
below 32		-11.8 (-5,298 to 5,274)	>0.99
preterm_rapture	390
no		—
yes		11.7 (-5,274 to 5,297)	>0.99
HIE	390
no		—
yes		-1.70 (-2.80 to -0.60)	0.002
iter = 15.0; df = 17; Statistic = 108; Log-likelihood = -532; AIC = 1,098; BIC = 1,165; Residual df = 373; No. Obs. = 390; p-value = <0.001
¹ CI = Confidence Interval

Multivariate lognormal PH model

Independent Variables	N	exp(Beta) (95% CI)¹	p-value
Lognormal model
(Intercept)	390	2.52 (1.68 to 3.37)	<0.001
Birth_attended	390
Doctor		—
Nurse/Midwife		-0.01 (-0.30 to 0.28)	0.94
Traditional		0.09 (-0.57 to 0.75)	0.79
Parity	390
adequate		—
High Risk		0.15 (-0.10 to 0.40)	0.25
gender	390
Female		—
Male		-0.08 (-0.32 to 0.16)	0.51
BirthWeight	390
<1.5Kg		—
1.5-2.5Kg		0.57 (0.29 to 0.86)	<0.001
COAF	390
clear/yellow		—
green/brown		0.18 (-0.09 to 0.45)	0.19
AMO	390
foul		—
not foul		-0.25 (-0.51 to 0.02)	0.069
Mode	390
C section		—
NVD		0.31 (0.03 to 0.59)	0.032
other		-0.37 (-0.70 to -0.04)	0.030
Maternal_age	390
15-25		—
25-29		0.25 (-0.13 to 0.63)	0.20
30-39		0.14 (-0.20 to 0.49)	0.41
40+		0.18 (-0.19 to 0.56)	0.34
gestational_age	390
above 32		—
below 32		-4.53 (-1,148 to 1,138)	>0.99
preterm_rapture	390
no		—
yes		4.24 (-1,139 to 1,147)	>0.99
HIE	390
no		—
yes		-1.24 (-1.87 to -0.60)	<0.001
iter = 12.0; df = 17; Statistic = 125; Log-likelihood = -521; AIC = 1,077; BIC = 1,144; Residual df = 373; No. Obs. = 390; p-value = <0.001
¹ CI = Confidence Interval

Multivariate exponential PH model

Independent Variables	N	exp(Beta) (95% CI)¹	p-value
exponential model
(Intercept)	390	4.19 (2.60 to 5.78)	<0.001
Birth_attended	390
Doctor		—
Nurse/Midwife		-0.17 (-0.56 to 0.22)	0.40
Traditional		-0.02 (-0.98 to 0.94)	0.97
Parity	390
adequate		—
High Risk		0.09 (-0.23 to 0.42)	0.56
gender	390
Female		—
Male		-0.07 (-0.38 to 0.23)	0.64
BirthWeight	390
<1.5Kg		—
1.5-2.5Kg		0.69 (0.32 to 1.06)	<0.001
COAF	390
clear/yellow		—
green/brown		0.22 (-0.12 to 0.56)	0.21
AMO	390
foul		—
not foul		-0.29 (-0.62 to 0.04)	0.085
Mode	390
C section		—
NVD		0.31 (-0.07 to 0.69)	0.11
other		-0.38 (-0.79 to 0.02)	0.065
Maternal_age	390
15-25		—
25-29		0.22 (-0.28 to 0.72)	0.39
30-39		0.10 (-0.34 to 0.55)	0.64
40+		0.16 (-0.33 to 0.64)	0.53
gestational_age	390
above 32		—
below 32		-15.2 (-5,858 to 5,827)	>0.99
preterm_rapture	390
no		—
yes		15.0 (-5,828 to 5,858)	>0.99
HIE	390
no		—
yes		-2.33 (-3.73 to -0.93)	0.001
iter = 15.0; df = 16; Statistic = 109; Log-likelihood = -543; AIC = 1,118; BIC = 1,181; Residual df = 374; No. Obs. = 390; p-value = <0.001
¹ CI = Confidence Interval

Multivariate loglogistic PH model

Independent Variables	N	exp(Beta) (95% CI)¹	p-value
loglogistic model
(Intercept)	390	2.61 (1.63 to 3.58)	<0.001
Birth_attended	390
Doctor		—
Nurse/Midwife		0.02 (-0.29 to 0.33)	0.90
Traditional		0.05 (-0.61 to 0.71)	0.89
Parity	390
adequate		—
High Risk		0.19 (-0.07 to 0.45)	0.16
gender	390
Female		—
Male		-0.12 (-0.37 to 0.13)	0.34
BirthWeight	390
<1.5Kg		—
1.5-2.5Kg		0.58 (0.28 to 0.89)	<0.001
COAF	390
clear/yellow		—
green/brown		0.19 (-0.09 to 0.46)	0.19
AMO	390
foul		—
not foul		-0.25 (-0.52 to 0.02)	0.067
Mode	390
C section		—
NVD		0.29 (-0.01 to 0.59)	0.057
other		-0.45 (-0.80 to -0.11)	0.010
Maternal_age	390
15-25		—
25-29		0.24 (-0.15 to 0.64)	0.23
30-39		0.19 (-0.16 to 0.55)	0.29
40+		0.19 (-0.19 to 0.58)	0.32
gestational_age	390
above 32		—
below 32		-8.50 (-3,920 to 3,903)	>0.99
preterm_rapture	390
no		—
yes		8.22 (-3,903 to 3,920)	>0.99
HIE	390
no		—
yes		-1.33 (-2.13 to -0.53)	0.001
iter = 14.0; df = 17; Statistic = 127; Log-likelihood = -527; AIC = 1,088; BIC = 1,155; Residual df = 373; No. Obs. = 390; p-value = <0.001
¹ CI = Confidence Interval

Comparing PH models

Comments

it can be noted that the Lognormal PH model has the least Bayesian information and Akaike information Criterion hence can be chosen as the best model among the PH Models.
The loglikelihood also further confirms that the Lognormal is the best fitting model since it is smallest for the Lognormal model.
Overally it means that the Lognormal model is the best fitting among the parametric hazard models for the given dataset.

Accelerated failure time MODELS

Weibull AFT

#> Call:
#> aftreg(formula = Surv(days, DEATH) ~ Birth_attended + Parity + 
#>     gender + BirthWeight + COAF + AMO + Birth_attended + Mode + 
#>     Maternal_age + gestational_age + preterm_rapture + HIE, data = out_new, 
#>     dist = "weibull")
#> 
#> Covariate          W.mean      Coef Time-Accn  se(Coef)    Wald p
#> Birth_attended 
#>           Doctor    0.207     0         1           (reference)
#>    Nurse/Midwife    0.751     0.153     1.166     0.156     0.326 
#>      Traditional    0.042     0.207     1.230     0.380     0.586 
#> Parity 
#>         adequate    0.378     0         1           (reference)
#>        High Risk    0.622    -0.110     0.896     0.127     0.387 
#> gender 
#>           Female    0.387     0         1           (reference)
#>             Male    0.613     0.035     1.035     0.123     0.779 
#> BirthWeight 
#>           <1.5Kg    0.331     0         1           (reference)
#>        1.5-2.5Kg    0.669    -0.555     0.574     0.149     0.000 
#> COAF 
#>     clear/yellow    0.572     0         1           (reference)
#>      green/brown    0.428    -0.171     0.843     0.136     0.209 
#> AMO 
#>             foul    0.601     0         1           (reference)
#>         not foul    0.399     0.256     1.291     0.133     0.054 
#> Mode 
#>        C section    0.344     0         1           (reference)
#>              NVD    0.473    -0.301     0.740     0.150     0.045 
#>            other    0.184     0.299     1.349     0.161     0.063 
#> Maternal_age 
#>            15-25    0.172     0         1           (reference)
#>            25-29    0.221    -0.209     0.811     0.198     0.290 
#>            30-39    0.370    -0.161     0.851     0.175     0.355 
#>              40+    0.237    -0.214     0.807     0.192     0.265 
#> gestational_age 
#>         above 32    0.301     0         1           (reference)
#>         below 32    0.699     6.446   630.254    80.932     0.937 
#> preterm_rapture 
#>               no    0.298     0         1           (reference)
#>              yes    0.702    -6.274     0.002    80.932     0.938 
#> HIE 
#>               no    0.139     0         1           (reference)
#>              yes    0.861     1.705     5.501     0.561     0.002 
#> 
#> Baseline parameters:
#> log(scale)                    3.447               0.640     0.000 
#> log(shape)                    0.262               0.052     0.000 
#> Baseline life expectancy:  29 
#> 
#> Events                    191 
#> Total time at risk          1602 
#> Max. log. likelihood      -531.83 
#> LR test statistic         108 
#> Degrees of freedom        15 
#> Overall p-value           3.33067e-16

lognormal AFT

#> Call:
#> aftreg(formula = Surv(days, DEATH) ~ Birth_attended + Parity + 
#>     gender + BirthWeight + COAF + AMO + Birth_attended + Mode + 
#>     Maternal_age + gestational_age + preterm_rapture + HIE, data = out_new, 
#>     dist = "lognormal")
#> 
#> Covariate          W.mean      Coef Time-Accn  se(Coef)    Wald p
#> Birth_attended 
#>           Doctor    0.207     0         1           (reference)
#>    Nurse/Midwife    0.751     0.012     1.012     0.150     0.938 
#>      Traditional    0.042    -0.088     0.916     0.336     0.793 
#> Parity 
#>         adequate    0.378     0         1           (reference)
#>        High Risk    0.622    -0.146     0.864     0.127     0.251 
#> gender 
#>           Female    0.387     0         1           (reference)
#>             Male    0.613     0.079     1.082     0.120     0.512 
#> BirthWeight 
#>           <1.5Kg    0.331     0         1           (reference)
#>        1.5-2.5Kg    0.669    -0.574     0.563     0.146     0.000 
#> COAF 
#>     clear/yellow    0.572     0         1           (reference)
#>      green/brown    0.428    -0.182     0.834     0.138     0.189 
#> AMO 
#>             foul    0.601     0         1           (reference)
#>         not foul    0.399     0.246     1.279     0.135     0.069 
#> Mode 
#>        C section    0.344     0         1           (reference)
#>              NVD    0.473    -0.310     0.733     0.145     0.032 
#>            other    0.184     0.367     1.444     0.170     0.030 
#> Maternal_age 
#>            15-25    0.172     0         1           (reference)
#>            25-29    0.221    -0.250     0.779     0.196     0.201 
#>            30-39    0.370    -0.144     0.866     0.174     0.409 
#>              40+    0.237    -0.184     0.832     0.192     0.339 
#> gestational_age 
#>         above 32    0.301     0         1           (reference)
#>         below 32    0.699     3.851    47.036   101.473     0.970 
#> preterm_rapture 
#>               no    0.298     0         1           (reference)
#>              yes    0.702    -3.566     0.028   101.473     0.972 
#> HIE 
#>               no    0.139     0         1           (reference)
#>              yes    0.861     1.239     3.451     0.324     0.000 
#> 
#> Baseline parameters:
#> log(scale)                    2.520               0.431     0.000 
#> log(shape)                    0.101               0.050     0.044 
#> Baseline life expectancy:  NA 
#> 
#> Events                    191 
#> Total time at risk          1602 
#> Max. log. likelihood      -521.27 
#> LR test statistic         125 
#> Degrees of freedom        15 
#> Overall p-value           0

loglogistic AFT

#> Call:
#> aftreg(formula = Surv(days, DEATH) ~ Birth_attended + Parity + 
#>     gender + BirthWeight + COAF + AMO + Birth_attended + Mode + 
#>     Maternal_age + gestational_age + preterm_rapture + HIE, data = out_new, 
#>     dist = "loglogistic")
#> 
#> Covariate          W.mean      Coef Time-Accn  se(Coef)    Wald p
#> Birth_attended 
#>           Doctor    0.207     0         1           (reference)
#>    Nurse/Midwife    0.751    -0.020     0.980     0.158     0.900 
#>      Traditional    0.042    -0.048     0.953     0.337     0.886 
#> Parity 
#>         adequate    0.378     0         1           (reference)
#>        High Risk    0.622    -0.187     0.829     0.133     0.158 
#> gender 
#>           Female    0.387     0         1           (reference)
#>             Male    0.613     0.122     1.129     0.126     0.335 
#> BirthWeight 
#>           <1.5Kg    0.331     0         1           (reference)
#>        1.5-2.5Kg    0.669    -0.583     0.558     0.155     0.000 
#> COAF 
#>     clear/yellow    0.572     0         1           (reference)
#>      green/brown    0.428    -0.186     0.831     0.141     0.189 
#> AMO 
#>             foul    0.601     0         1           (reference)
#>         not foul    0.399     0.253     1.288     0.138     0.067 
#> Mode 
#>        C section    0.344     0         1           (reference)
#>              NVD    0.473    -0.290     0.748     0.152     0.057 
#>            other    0.184     0.452     1.572     0.176     0.010 
#> Maternal_age 
#>            15-25    0.172     0         1           (reference)
#>            25-29    0.221    -0.240     0.786     0.201     0.233 
#>            30-39    0.370    -0.194     0.824     0.183     0.289 
#>              40+    0.237    -0.195     0.823     0.197     0.324 
#> gestational_age 
#>         above 32    0.301     0         1           (reference)
#>         below 32    0.699     5.765   319.023   149.609     0.969 
#> preterm_rapture 
#>               no    0.298     0         1           (reference)
#>              yes    0.702    -5.494     0.004   149.609     0.971 
#> HIE 
#>               no    0.139     0         1           (reference)
#>              yes    0.861     1.328     3.774     0.408     0.001 
#> 
#> Baseline parameters:
#> log(scale)                    2.606               0.496     0.000 
#> log(shape)                    0.642               0.057     0.000 
#> Baseline life expectancy:  NA 
#> 
#> Events                    191 
#> Total time at risk          1602 
#> Max. log. likelihood      -526.83 
#> LR test statistic         127 
#> Degrees of freedom        15 
#> Overall p-value           0

Comparing Accelerated failure time models

Comments

When the three accelerated failure time models were weighed and contrasted using AIC, BIC and log likelihood ratio, the one with the smallest value of AIC and BIC were considered the best among the other models.
From Table above, the Lognormal AFT model can be regarded to be a suitable AFT model for the data since it has the least AIC and BIC values when contrasted with other AFT models. Thus the Lognormal AFT \((AIC=1076.545,BIC=-584.0072)\) was a better fit among all contrasted AFT models.

Compare models graphically

Overal Best Model

The log-likelihood ratios and AIC were used to choose the model that was overal best for the data.
Table above displays the log-likelihood ratios and Akaike’s Information Criterion (AIC) for the best PH and AFT models.
Among the desired models, the best and most efficient model is the one with the lowest AIC value.
By comparing the AIC values, it was observed that the Lognormal model was the overall best-fitted model for the data.
Therefore, the data was best fitted using the Lognormal Accelerated Failure Time model.

Introduction

The objectives of the study are to:

General survival analysis ideas

Censoring

Survival function

Relationship between h(t) and S(t)

Non-parametric methods

Empirical Survival Function

Kaplan-Meier estimate

Log rank test

Parametric survival distributions

Regression models in Survival analysis

Cox’s Proportional Hazards

Fitting a proportional hazards model

The partial likelihood

Inference on the hazard ratios

Residuals for Cox Regression Model

Cox-Snell ResidualS

Martingale residuals

Deviance Residuals

Schoenfield residuals

Parametric Proportional Hazard Models

Weibull PH model

The Exponential PH model

Accelerated Failure Time Models

Weibull AFT model

log-logistic AFT model

Lognormal AFT model

Model Checking

Akaike Information Criterion

Bayesian Information Criterion

Variable selection

Stepwise regression

Backward elimination

data wrangling

Demographics

Clinical factors

further exploration on

C section vs DIED==Yes

Proportions

Proportional test for those who died and had C section

Survival Analysis

Instantaneous Hazard

kaplan Meier Curves

Gender and Color of Amniotic fluid

kaplan meier estimates gestational age And amniotic fluid odour

Mode of delivery and Birthweight

Preterm Rapture And Parity

Univariate Cox and Univariate Logistic regression

Multivariate Cox and Logistic Regression

Model diagnostics for Cox Regression

Deviance Residuals

Dfbeta

Martingale and Deviance Residuals

Parametric models

Univariate Weibull model

Multivariate Weibul PH model

Multivariate lognormal PH model

Multivariate exponential PH model

Multivariate loglogistic PH model

Comparing PH models

Accelerated failure time MODELS

Weibull AFT

lognormal AFT

loglogistic AFT

Comparing Accelerated failure time models

Compare models graphically

Overal Best Model

`C section` vs `DIED==Yes`